Quantized Neural Network



Searching for Low-Bit Weights in Quantized Neural Networks

Neural Information Processing Systems

Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators. However, the quantization functions used in most conventional quantization methods are non-differentiable, which increases the optimization difficulty of quantized networks. Compared with full-precision parameters (i.e., 32-bit floating-point numbers), low-bit values are selected from a much smaller set; for example, there are only 16 possibilities in a 4-bit space. Thus, we propose to treat the discrete weights in an arbitrary quantized neural network as searchable variables, and to search for them accurately with a differentiable method. In particular, each weight is represented as a probability distribution over the discrete value set. The probabilities are optimized during training, and the values with the highest probability are selected to establish the desired quantized network. Experimental results on benchmarks demonstrate that the proposed method produces quantized neural networks with higher performance than state-of-the-art methods on both image classification and super-resolution tasks.
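The search procedure the abstract describes can be sketched in a few lines. The candidate value set, tensor shapes, and softmax parameterization below are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-bit candidate set: each weight may take one of 4 discrete values.
values = np.array([-1.0, -0.5, 0.5, 1.0])

# One learnable logit per (weight, candidate) pair; softmax turns the logits
# into a probability distribution over the candidate values.
logits = rng.normal(size=(5, len(values)))  # 5 weights for illustration

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax(logits)                      # (5, 4), each row sums to 1
soft_weights = probs @ values                # differentiable surrogate used in training
hard_weights = values[probs.argmax(axis=1)]  # argmax selection for the final network

print(soft_weights.round(3))
print(hard_weights)
```

In training, gradients flow through the soft (expected) weights to the logits; after convergence, the argmax values replace them, which is what makes the otherwise non-differentiable discrete choice searchable.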


Quantum-Classical Hybrid Quantized Neural Network

Li, Wenxin, Wang, Chuan, Zhu, Hongdong, Gao, Qi, Ma, Yin, Wei, Hai, Wen, Kai

arXiv.org Artificial Intelligence

In this work, we introduce a novel Quadratic Binary Optimization (QBO) framework for training a quantized neural network. The framework enables the use of arbitrary activation and loss functions through spline interpolation, while Forward Interval Propagation addresses the nonlinearities and the multi-layered, composite structure of neural networks by discretizing activation functions into linear subintervals. This preserves the universal approximation properties of neural networks while making complex nonlinear functions accessible to quantum solvers, broadening their applicability in artificial intelligence. Theoretically, we derive an upper bound on the approximation error and on the number of Ising spins required, by deriving the sample complexity of the empirical risk minimization problem from an optimization perspective. A key challenge in solving the associated large-scale Quadratic Constrained Binary Optimization (QCBO) model is the presence of numerous constraints. To overcome this, we adopt the Quantum Conditional Gradient Descent (QCGD) algorithm, which solves QCBO directly on quantum hardware. We establish the convergence of QCGD under a quantum oracle subject to randomness, bounded variance, and limited coefficient precision, and further provide an upper bound on the Time-To-Solution. To enhance scalability, we further incorporate a decomposed copositive optimization scheme that replaces the monolithic lifted model with sample-wise subproblems. This decomposition substantially reduces the quantum resource requirements and enables efficient low-bit neural network training. We further propose using the QCGD and Quantum Progressive Hedging (QPH) algorithms to efficiently solve the decomposed problem.
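The discretization of activation functions into linear subintervals that Forward Interval Propagation relies on can be illustrated with a small sketch; the input range, breakpoint count, and choice of sigmoid below are assumptions for illustration, not the paper's setup:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Break the activation's input range into K linear pieces (a linear spline).
K = 16
breakpoints = np.linspace(-6.0, 6.0, K + 1)
knot_values = sigmoid(breakpoints)

def piecewise_linear_sigmoid(x):
    # np.interp evaluates exactly this piecewise-linear interpolant.
    return np.interp(x, breakpoints, knot_values)

xs = np.linspace(-6.0, 6.0, 10001)
max_err = np.abs(piecewise_linear_sigmoid(xs) - sigmoid(xs)).max()
print(f"max error with {K} linear pieces: {max_err:.5f}")
```

Each linear piece is expressible with binary/linear variables, which is what lets a quadratic binary solver handle an otherwise nonlinear activation, at the cost of a controllable interpolation error.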


Searching for Low-Bit Weights in Quantized Neural Networks, Zhaohui Yang

Neural Information Processing Systems

However, the quantization functions used in most conventional quantization methods are non-differentiable, which increases the optimization difficulty of quantized networks. Compared with full-precision parameters (i.e., 32-bit floating-point numbers), low-bit values are selected from a much smaller set; for example, there are only 16 possibilities in a 4-bit space. Thus, we propose to treat the discrete weights in an arbitrary quantized neural network as searchable variables, and to search for them accurately with a differentiable method. In particular, each weight is represented as a probability distribution over the discrete value set. The probabilities are optimized during training, and the values with the highest probability are selected to establish the desired quantized network.


Review for NeurIPS paper: Searching for Low-Bit Weights in Quantized Neural Networks

Neural Information Processing Systems

Weaknesses: 1) A similar idea of learning an auxiliary differentiable network has also been introduced in the following paper. The main difference of this paper from that reference is that multiple bits are learned for each code here, whereas binary weights and representations would undoubtedly be more cost-efficient. More importantly, the authors did not discuss this similar reference. IJCAI, 2019. 2) I am very confused by Eq. (1). According to Eq. (1), the values v are discrete numbers, while p_i is the probability that an element of W takes the i-th discrete value.
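Based on the reviewer's description, Eq. (1) plausibly has the following soft-weight form; this is a reconstruction from context, not the paper's exact notation:

```latex
% Each weight W is relaxed to an expectation over the discrete candidate set
% {v_1, ..., v_n}; the p_i are the learned probabilities the reviewer mentions.
\hat{W} = \sum_{i=1}^{n} p_i \, v_i,
\qquad
p_i = \frac{\exp(a_i)}{\sum_{j=1}^{n} \exp(a_j)},
\qquad
\sum_{i=1}^{n} p_i = 1 .
```

Under this reading, the v_i are the fixed low-bit values and only the logits a_i (and hence the p_i) are trained, which resolves the apparent tension between discrete values and continuous probabilities.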


Review for NeurIPS paper: Searching for Low-Bit Weights in Quantized Neural Networks

Neural Information Processing Systems

The paper proposes a novel end-to-end gradient-based optimization for searching discrete low-bit weights in quantized networks. After reading the reviews, the rebuttal, and the discussion among reviewers, the paper is clearly recognized as novel and well executed. I would encourage the authors to further improve their work by better clarifying the decay strategy for the temperature in the camera-ready version, and by adding a comparison with SGDR scheduling, as pointed out by one of the reviewers. It would also be nice to mention how the proposed approach relates to "Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization".



On Expressive Power of Quantized Neural Networks under Fixed-Point Arithmetic

Hwang, Geonho, Park, Yeachan, Park, Sejun

arXiv.org Machine Learning

Research into the expressive power of neural networks typically considers real parameters and operations without rounding error. In this work, we study the universal approximation property of quantized networks under discrete fixed-point parameters and fixed-point operations that may incur errors due to rounding. We first provide a necessary condition and a sufficient condition on fixed-point arithmetic and activation functions for universal approximation by quantized networks. Then, we show that various popular activation functions satisfy our sufficient condition, e.g., Sigmoid, ReLU, ELU, SoftPlus, SiLU, Mish, and GELU; in other words, networks using those activation functions are capable of universal approximation. We further show that our necessary condition and sufficient condition coincide under a mild condition on activation functions: e.g., that for an activation function $\sigma$, there exists a fixed-point number $x$ such that $\sigma(x)=0$. Namely, we find a necessary and sufficient condition for a large class of activation functions. We lastly show that even quantized networks using binary weights in $\{-1,1\}$ can universally approximate for practical activation functions.
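The fixed-point setting studied here can be made concrete with a small sketch; the signed 16-bit format with 8 fractional bits is an illustrative assumption:

```python
import numpy as np

def to_fixed_point(x, frac_bits=8, total_bits=16):
    # Round to the nearest multiple of 2**-frac_bits and saturate to the
    # representable range of a signed fixed-point number with total_bits bits.
    scale = 2.0 ** frac_bits
    lo = -(2 ** (total_bits - 1)) / scale
    hi = (2 ** (total_bits - 1) - 1) / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

x = np.array([0.1234567, -1.5, 3.14159])
xq = to_fixed_point(x)
print(xq)                    # nearest points on the 2**-8 grid
print(np.abs(x - xq).max())  # in-range rounding error is at most 2**-9
```

Every weight, activation, and intermediate result in such a network lives on this discrete grid, which is exactly the rounding-error regime the paper's approximation results have to cope with.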


DeepNcode: Encoding-Based Protection against Bit-Flip Attacks on Neural Networks

Velčický, Patrik, Breier, Jakub, Kovačević, Mladen, Hou, Xiaolu

arXiv.org Artificial Intelligence

Fault injection attacks are a potent threat against embedded implementations of neural network models. Several attack vectors have been proposed, such as misclassification, model extraction, and trojan/backdoor planting. Most of these attacks work by flipping bits in the memory where quantized model parameters are stored. In this paper, we introduce an encoding-based protection method against bit-flip attacks on neural networks, titled DeepNcode. We experimentally evaluate our proposal with several publicly available models and datasets, using state-of-the-art bit-flip attacks: BFA, T-BFA, and TA-LBF. Our results show an increase in protection margin of up to 7.6× for 4-bit and 12.4× for 8-bit quantized networks. Memory overheads start at 50% of the original network size, while the time overheads are negligible. Moreover, DeepNcode does not require retraining and does not change the original accuracy of the model.
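The damage a single flipped bit can do to a quantized parameter is easy to see in a sketch; the int8 weight encoding below is an illustrative assumption:

```python
import numpy as np

def flip_bit(w, bit):
    # Flip one bit of a signed 8-bit weight, as a memory fault would,
    # by XOR-ing its raw byte and reinterpreting the result.
    return (np.uint8(w) ^ np.uint8(1 << bit)).astype(np.int8)

w = np.int8(23)          # 0b00010111
print(flip_bit(w, 0))    # flipping the LSB gives 22, a tiny perturbation
print(flip_bit(w, 7))    # flipping the sign bit gives -105, a catastrophic one
```

The asymmetry between low-order and high-order bits is why targeted attacks like BFA go after sign and high-magnitude bits, and why encoding-based defenses aim to make exactly those flips detectable.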


Frame Quantization of Neural Networks

Czaja, Wojciech, Na, Sanghoon

arXiv.org Machine Learning

Quantization is the process of compressing input from a continuous or large set of values into a small discrete set. It gained popularity in signal processing, where one of its primary goals is obtaining a condensed representation of the analogue signal suitable for digital storage and recovery. Examples of quantization algorithms include truncated binary expansion, pulse-code modulation (PCM), and sigma-delta (ΣΔ) quantization. Among them, ΣΔ algorithms stand out due to their theoretically guaranteed robustness. Mathematical theories were developed in several seminal works [3-5, 8, 11] and have been carefully studied since, e.g., [14, 15, 19, 27]. In recent years, the concept of quantization has also captured the attention of the machine learning community. The quantization of deep neural networks (DNNs) is considered one of the most effective network compression techniques [9]. Computers express the parameters of a neural network as 32-bit or 64-bit floating-point numbers.